Auditing and Maintaining Provenance in Software Packages
نویسندگان
چکیده
Science projects are increasingly investing in computational reproducibility. Constructing software pipelines to demonstrate reproducibility is also becoming increasingly common. To aid the process of constructing pipelines, science project members often adopt reproducible methods and tools. CDE is a software packaging tool for reproducible research that encapsulates source code, datasets and environments. However, CDE is not amenable for creating software pipelines that demonstrate computational reproducibility. In this work, we propose software provenance to be included as part of CDE so that resulting provenanceincluded CDE packages can be easily used for creating software pipelines. We describe provenance attributes that must be included and how they can be efficiently stored in a light-weight CDE package. Further we show how a provenance in a package can be used for creating software pipelines and maintained as new packages are created. We experimentally evaluate the overhead of auditing and maintaining provenance and compare with heavy weight approaches for reproducibility such as virtualization. Our experiments indicate minimal overheads.
منابع مشابه
Transparent Web Service Auditing via Network Provenance Functions
Detecting and explaining the nature of attacks in distributed web services is often difficult – determining the nature of suspicious activity requires following the trail of an attacker through a chain of heterogeneous software components including load balancers, proxies, worker nodes, and storage services. Unfortunately, existing forensic solutions cannot provide the necessary context to link...
متن کاملSPADE: Support for Provenance Auditing in Distributed Environments
SPADE is an open source software infrastructure for data provenance collection and management. The underlying data model used throughout the system is graph-based, consisting of vertices and directed edges that are modeled after the node and relationship types described in the Open Provenance Model. The system has been designed to decouple the collection, storage, and querying of provenance met...
متن کاملFormal Foundations of Reenactment and Transaction Provenance
Provenance is essential for auditing, data debugging, understanding transformations, and many additional use cases. All these use cases would benefit from provenance for transactional updates. We present a provenance model for snapshot isolation transactions extending the semiring framework with version annotations and updates. Based on this model, we present the first solution for computing th...
متن کاملProvenance-Based Auditing of Private Data Use
Across the world, organizations are required to comply with regulatory frameworks dictating how to manage personal information. Despite these, several cases of data leaks and exposition of private data to unauthorized recipients have been publicly and widely advertised. For authorities and system administrators to check compliance to regulations, auditing of private data processing becomes cruc...
متن کاملIntermediate Notation for Provenance and Workflow Reproducibility
We present a technique to capture retrospective provenance across a number of tools in a statistical software suite. Our goal is to facilitate portability of processes between the tools to enhance usability and to support reproducibility. We describe an intermediate notation to aid runtime capture of provenance and demonstrate conversion to an executable and editable workflow. The notation is a...
متن کامل